Skip to content

Added option to build GPU libraries separately #4298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stemann
Copy link
Contributor

@stemann stemann commented Apr 16, 2025

In some cases, it might be preferable to have GPU support as an optional add-on feature, e.g., in Julia. This PR adds a cmake option to enable the GPU library, and the GPU C library, to be built as separate shared libraries.

This enables a consumer to dlopen (libfaiss_c, and) libfaiss, and if, e.g., CUDA is available to also load the CUDA (libfaiss_gpu_c, and) libfaiss_gpu.

With the current build set-up, where the GPU objects are statically linked into libfaiss, a consumer might dlopen the non-CUDA libfaiss_c, and in turn the non-CUDA libfaiss. Then when loading the CUDA libfaiss_c, the dynamic linker will re-use the non-CUDA libfaiss and fail to resolve CUDA/GPU symbols in libfaiss:

using Faiss # Loads the non-CUDA `libfaiss_c`

# Hackish load of the CUDA `libfaiss_c`
using CUDA_Runtime_jll
FaissCUDAExt = Base.get_extension(Faiss, :FaissCUDAExt) 

FaissCUDAExt.get_num_gpus()
julia: symbol lookup error: /root/.julia/artifacts/ae2dd872b785cdb30908414f39baf70c7c6aaa3c/lib/libfaiss_c.so: undefined symbol: _ZN5faiss3gpu13getNumDevicesEv

@stemann
Copy link
Contributor Author

stemann commented Apr 16, 2025

Current PR is based on 1.10 - should of course be rebased.

@stemann
Copy link
Contributor Author

stemann commented Apr 16, 2025

It might make more sense to name the libraries libfaiss_cuda and libfaiss_rocm than libfaiss_gpu - to enable a consumer to load both CUDA and ROCm libraries into the same process.

@mnorris11
Copy link

Sorry, I didn't quite get this part:

With the current build set-up, where the GPU objects are statically linked into libfaiss, a consumer might dlopen the non-CUDA libfaiss_c, and in turn the non-CUDA libfaiss. Then when loading the CUDA libfaiss_c, the dynamic linker will re-use the non-CUDA libfaiss and fail to resolve CUDA/GPU symbols in libfaiss:

Why can't the user just load the CUDA libfaiss_c if they are using GPU, i.e. skip the first step?

@stemann
Copy link
Contributor Author

stemann commented Apr 23, 2025

The reason the user can't just load the CUDA libfaiss_c is a combination of:

  • It is preferable to provide access to Faiss from Julia with CUDA as add-on functionality.
  • Julia is a dynamic environment, so there is not necessarily an ahead-of-time decision to not load the non-GPU libfaiss_c.

In addition, the preferred method for adding CUDA functionality is via Julia package extensions, where the Faiss (Julia) package may be extended, at run-time, with additional functionality, e.g., CUDA.

I.e., it should be enough to add the Faiss Julia package to the user environment, and then if the user also adds the CUDA (or CUDA_Runtime_jll) Julia package, the CUDA functionality will automatically be made available.

@stemann stemann marked this pull request as draft April 24, 2025 08:36
@mnorris11
Copy link

Hi @stemann sorry for the delay, was still discussing internally. But I see you have moved the PR to draft mode, so I will follow up after you publish again

@stemann
Copy link
Contributor Author

stemann commented Apr 24, 2025

No worries. Please do comment on the general idea. I only moved the PR to draft as it is a bit of a rough draft.

@mnorris11
Copy link

mnorris11 commented May 5, 2025

@stemann I am trying to see if making it just shared by default is workable and not having a static version, then we can just keep the 1 GPU flag instead of introducing a new one. If not, we can go with the approach in this PR.

@stemann
Copy link
Contributor Author

stemann commented May 5, 2025

Ah - that sounds good.

On a related note, I've also been toying around (also in JuliaPackaging/Yggdrasil#11057) with building the AVX2, and AVX512 versions, where it would actually be preferable to get a library with the same name, e.g. libfaiss.so, and just have the content be different depending on the targeted instruction set (e.g., plain x86_64, or avx2, avx512, ...). But no need to complicate this request/PR with that - that's a separate concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants